88 research outputs found

    Incremental clustering of news reports

    When an event occurs in the real world, numerous news reports describing it start to appear on different news sites within a few minutes. This may result in a huge amount of information for users, and automated processes may be required to help manage it. In this paper, we describe a clustering system that groups news reports from disparate sources into event-centric clusters, i.e., clusters of news reports describing the same event. A user can identify any RSS feed as a source of news he or she would like to receive, and our system clusters reports from the separate RSS feeds as they arrive, without knowing the number of clusters in advance. The system was designed to function well in an online incremental environment. In our evaluation, the system performed very well at fine-grained clustering but rather poorly at coarser-grained clustering.
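
The abstract above does not say which clustering algorithm the system uses. As an illustration only, the following is a minimal sketch of one common approach to clustering reports as they arrive without fixing the number of clusters: single-pass threshold clustering over bag-of-words vectors. The class name `IncrementalClusterer`, the cosine similarity measure, and the threshold value are all assumptions made for this example, not details taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class IncrementalClusterer:
    """Hypothetical single-pass clusterer; not the system described in the paper."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value, would need tuning
        self.centroids = []         # summed term counts, one Counter per cluster
        self.members = []           # report texts, one list per cluster

    def add(self, report):
        """Assign a report to the most similar cluster, or open a new one."""
        vec = bag_of_words(report)
        best_index, best_score = None, 0.0
        for i, centroid in enumerate(self.centroids):
            score = cosine(vec, centroid)
            if score > best_score:
                best_index, best_score = i, score
        if best_index is not None and best_score >= self.threshold:
            self.centroids[best_index].update(vec)
            self.members[best_index].append(report)
            return best_index
        self.centroids.append(Counter(vec))
        self.members.append([report])
        return len(self.centroids) - 1
```

In a sketch like this, the threshold controls granularity: a higher value tends to produce many small, tightly focused clusters, while a lower value merges reports into broader ones.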

    The completeness of electronic medical record data for patients with type 2 diabetes in primary care and its implications for computer modelling of predicted clinical outcomes.

    Background: Computer models predicting outcomes among patients with Type 2 Diabetes (T2D) can be used as disease management program evaluation tools. The clinical data required as inputs for these models can include annually updated measurements such as blood pressure and glycated haemoglobin (HbA1c). These data can be extracted from primary care physician office systems but there are concerns about their completeness. Objectives/methods: This study addressed the completeness of routinely collected data extracted from 12 primary care practices in Australia. Data on annual availability of blood pressure, weight, total cholesterol, HDL-cholesterol and HbA1c values for regular patients were extracted in 2013 and analysed for temporal trends over the period 2000 to 2012. An ordinal logistic regression model was used to evaluate associations between patient characteristics and completeness of their records. Primary care practitioners were surveyed to identify barriers to recording data and strategies to improve its completeness. Results: Over the study period completeness of data improved substantially from less than 20% for some parameters up to a level of approximately 80% complete, except for the recording of weight. T2D patients with Ischaemic Heart Disease were more likely to have their blood pressure recorded (OR 1.6, p=0.02). Practitioners’ responses suggest they were not experiencing any major barriers to using their electronic medical record system but did agree with some suggested strategies to improve record completeness. Conclusion: The completeness of routinely collected data suitable for input into computerised predictive models is improving although other dimensions of data quality need to be addressed.

    Automatic classification of web pages into bookmark categories

    We describe a technique to automatically classify a web page into an existing bookmark category whenever a user decides to bookmark a page. HyperBK compares a bag-of-words representation of the page to descriptions of categories in the user’s bookmark file. Unlike default web browser dialogs in which the user may be presented with the category into which he or she saved the last bookmarked file, HyperBK also offers the category most similar to the page being bookmarked. The user can opt to save the page to the last category used; create a new category; or save the page elsewhere. In an evaluation, the user’s preferred category was offered on average 67% of the time.
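
The comparison of a page's bag-of-words representation with category descriptions can be sketched as follows. This is a minimal illustration, assuming raw term counts and cosine similarity; the abstract does not state how HyperBK weights terms or measures similarity, and the function name `most_similar_category` and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_category(page_text, category_descriptions):
    """Return the category whose description best matches the page.

    category_descriptions maps a category name to descriptive text.
    Returns None when there are no categories or nothing overlaps.
    """
    page_vec = bag_of_words(page_text)
    scored = [
        (cosine(page_vec, bag_of_words(description)), name)
        for name, description in category_descriptions.items()
    ]
    if not scored:
        return None
    best_score, best_name = max(scored)
    return best_name if best_score > 0 else None


# Hypothetical bookmark categories for illustration.
categories = {
    "cooking": "recipes baking bread pasta kitchen",
    "programming": "python code compiler software debugging",
}
suggestion = most_similar_category("A guide to debugging Python software", categories)
```

A bookmarking dialog built on this idea could offer `suggestion` alongside the last category used, leaving the final choice to the user.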

    Achieving user adaptivity in hyperspace with hypercontext

    HyperContext is a framework for adaptive and adaptable hypertext. In any hyperspace, each piece of information (e.g., contained in a document or node) is normally juxtaposed with other information via links. Two or more hypertext users may encounter the same document although they may have followed different paths to reach it. Those users may well describe different aspects of the document as relevant to their needs and requirements. The HyperContext framework allows users to create different interpretations of information in context, which will also be available to future users.

    COLLEGE : a collaborative on-line lecture environment for group and individual eLearning

    COLLEGE is a platform for the development and delivery of interactive learning content for individual students or groups and will be built during 2005-2007. Phase I will deliver primarily video- and audio-based learning content together with tools to provide automated assistance and assessment of student progress. Phase II will increase the options for the learning content to include non-time-based media and will increase the level of Just-in-Time support for students. The COLLEGE toolset will be based around virtual metaphors corresponding to traditional tools for learning, recording, interacting with the source of the learning material, and assessment.

    How did I find that : automatically constructing queries from bookmarked web pages and categories

    We present ‘How Did I Find That?’ (HDIFT), an algorithm to find web pages related to categories of bookmarks (bookmark folders) or individual bookmarks stored in a user’s bookmark (or favorites) file. HDIFT automatically generates a query from the selected bookmarks and categories, submits the query to a third-party search engine, and presents the results to the user. HDIFT’s approach is innovative in that we select keywords to generate the query from a bookmarked web page’s parents (other web-based documents that contain a link to the bookmarked web page), rather than from the bookmarked web page itself. Our initial limited evaluation results are promising. Volunteers who participated in the evaluation considered 20% of all query results to be relevant and interesting enough to bookmark. Additionally, 56.9% of the queries generated yielded result sets (of at most 10 results) containing at least one interesting and bookmarkable web page.
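
The idea of building a query from a bookmarked page's parents rather than from the page itself can be sketched as below. The abstract does not describe how HDIFT selects keywords, so this example assumes a simple heuristic: prefer terms that occur in the most parent pages. The function name `build_query`, the stopword list, and the sample texts are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list; an assumption, not taken from the paper.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with"}


def build_query(parent_texts, max_terms=5):
    """Build a query string from the text of pages linking to a bookmark.

    Terms are ranked by how many parent pages contain them, then by total
    frequency, then alphabetically so the result is deterministic.
    """
    document_frequency = Counter()
    total_frequency = Counter()
    for text in parent_texts:
        terms = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
        total_frequency.update(terms)
        document_frequency.update(set(terms))
    ranked = sorted(
        document_frequency,
        key=lambda term: (-document_frequency[term], -total_frequency[term], term),
    )
    return " ".join(ranked[:max_terms])


# Hypothetical parent pages of a bookmarked guitar site.
parents = [
    "Jazz guitar lessons and jazz chord charts",
    "Free jazz guitar tabs",
    "Learn guitar chord theory",
]
query = build_query(parents, max_terms=3)
```

The resulting string would then be submitted to a search engine; that step is omitted here because it depends on the engine's own interface.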

    Risk factors associated with early smoking onset in two large birth cohorts.

    We use prospective data from the ongoing British Cohort Study (BCS) and Millennium Cohort Study (MCS) to: 1) document changes in the prevalence of childhood smoking onset; 2) assess whether broad historic shifts in key risk factors, such as maternal education, parental smoking, and peer childhood smoking, explain observed cohort changes in childhood smoking; and 3) evaluate whether inequalities in onset have narrowed or widened during this period. The children in these two studies were born 31 years apart (i.e., BCS in 1970; MCS in 2001), and were followed from infancy through early adolescence (n = 23,506 children). Our outcome variable is child self-reports of smoking (ages 10, 11). Early life risk factors were assessed via parent reports in infancy and age 5. Findings reveal that the odds of childhood smoking were over 12 times greater among children born in 1970 versus 2001. The decline in childhood smoking by cohort was partly explained by increases in maternal education, decreases in mothers' and fathers' smoking, and declines in the number of children whose friends smoked. Results also show that childhood smoking is now more linked to early life disadvantages, as MCS children were especially likely to smoke if their mother had low education or used cigarettes, or if the child had a friend who smoked. Although the prevalence of child and adult smoking has dropped dramatically in the past three decades, policy efforts should focus on the increased social inequality resulting from the concentration of early life cigarette use among disadvantaged children.

    Expanding query terms in context

    Query expansion is normally performed using a thesaurus that is either generated from a collection of documents, or is otherwise language specific. We present a technique to discover associations between query terms that are synonyms based on past queries and documents common to multiple result sets, to enable query expansion to occur in context.

    Comparing title only and full text indexing to classify web pages into bookmark categories

    Web browser bookmark files are used to retain and organise records of web sites that the user would like to revisit. However, bookmark files tend to be under-utilised, as time and effort is needed to keep them organised. We use two methods to index and automatically classify documents referred to in 80 bookmark files, based on document title-only and full-text indexing, respectively. We evaluate the indexing methods by selecting a bookmark entry to classify from a bookmark file, and recreating the bookmark file so that it contains only entries created before the selected bookmark entry. Classification based on full-text indexing generally outperforms that based on title-only indexing. The ability to recommend the correct category at rank 1 using full-text indexing ranges from 20% to 41%, depending on the number of category members. However, combining the approaches results in an increase to 37% to 59%, but we would need to recommend up to two categories to users. By recommending up to 10 categories, this increases to 58% to 80%.
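
The figures above describe how often the correct category appears among the first one, two, or ten recommendations. A minimal sketch of that kind of measurement is shown below, assuming each test case yields a ranked list of category names and one correct answer. The function name `hit_rate_at_k` and the sample data are hypothetical and do not reproduce the paper's results.

```python
def hit_rate_at_k(ranked_recommendations, correct_categories, k):
    """Fraction of test cases whose correct category is in the top k.

    ranked_recommendations: one ranked list of category names per test case.
    correct_categories: the category the user actually chose, per test case.
    """
    if len(ranked_recommendations) != len(correct_categories):
        raise ValueError("one correct category is required per ranking")
    if not correct_categories:
        return 0.0
    hits = sum(
        1
        for ranking, correct in zip(ranked_recommendations, correct_categories)
        if correct in ranking[:k]
    )
    return hits / len(correct_categories)


# Hypothetical rankings for three bookmark entries.
rankings = [["news", "sport"], ["travel", "music"], ["work", "recipes"]]
truth = ["news", "music", "hobbies"]
rate_at_1 = hit_rate_at_k(rankings, truth, k=1)
rate_at_2 = hit_rate_at_k(rankings, truth, k=2)
```

Raising `k` can never lower the hit rate, which matches the pattern in the abstract: offering more categories increases the chance that the correct one is among them, at the cost of a longer list for the user.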